Abstract

Confidence bands are confidence sets for an unknown function $f$,
containing all functions within some sup-norm distance of an estimator.
In the density estimation, regression, and white noise models, we
consider the problem of constructing adaptive confidence bands, whose
width contracts at an optimal rate over a range of Hölder classes.

While adaptive estimators exist, in general adaptive confidence
bands do not, and to proceed we must place further conditions on $f$.
We discuss previous approaches to this issue, and show it is necessary
to restrict $f$ to fundamentally smaller classes of functions.

We then consider the self-similar functions, whose Hölder norm is
similar at large and small scales. We show that such functions may
be considered typical functions of a given Hölder class, and that the
assumption of self-similarity is both necessary and sufficient for the
construction of adaptive bands. Finally, we show that this assumption
allows us to resolve the problem of undersmoothing, creating bands
which are honest simultaneously for functions of any Hölder norm.

1 Introduction 

Suppose we have an unknown function $f : [0, 1] \to \mathbb{R}$ we wish to estimate.
Our data may come from:

(i) density estimation, where $f$ is a density on $[0, 1]$, and we observe

\[ X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} f, \]

(ii) fixed design regression, where we observe

\[ Y_i := f(x_i) + \varepsilon_i, \qquad \varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2), \]

for $x_i := i/n$, $i = 1, \dots, n$; or

(iii) white noise, where we observe the process

\[ Y_t := \int_0^t f(s)\,ds + n^{-1/2} B_t, \]

for a standard Brownian motion $B$.
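As an illustration of the three observation schemes, the following sketch simulates each of them using only the Python standard library. The target $f(x) = 2x$, the function names, the noise level `sigma`, and the grid used to discretise the white noise path are all assumptions made for this example; none of them comes from the paper.

```python
import math
import random

def sample_density(n, rng):
    """Draw X_1..X_n i.i.d. from the illustrative density f(x) = 2x on [0, 1].
    Inverse-CDF sampling: F(x) = x^2, so X = sqrt(U) with U uniform."""
    return [math.sqrt(rng.random()) for _ in range(n)]

def sample_regression(f, n, sigma, rng):
    """Fixed design regression: Y_i = f(i/n) + eps_i, eps_i ~ N(0, sigma^2)."""
    return [f(i / n) + rng.gauss(0.0, sigma) for i in range(1, n + 1)]

def sample_white_noise(f, n, grid, rng):
    """Discretised path of Y_t = int_0^t f(s) ds + n^{-1/2} B_t on `grid` equal
    steps (left-endpoint Riemann sum plus independent Gaussian increments)."""
    dt = 1.0 / grid
    path = [0.0]
    for m in range(grid):
        drift = f(m * dt) * dt
        noise = rng.gauss(0.0, math.sqrt(dt / n))
        path.append(path[-1] + drift + noise)
    return path

rng = random.Random(0)
f = lambda x: 2.0 * x                        # hypothetical target function
xs = sample_density(1000, rng)               # model (i)
ys = sample_regression(f, 1000, 0.5, rng)    # model (ii)
yt = sample_white_noise(f, 1000, 500, rng)   # model (iii)
```

In the white noise sketch the endpoint `yt[-1]` approximates $\int_0^1 f(s)\,ds$ with noise of standard deviation $n^{-1/2}$, which is one way to see why this model is treated as an idealisation of the other two.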

The performance of an estimator $\hat f_n$ depends on the smoothness of the
function $f$. In the following, we will measure performance by the $L^\infty$ loss,
$\|\hat f_n - f\|_\infty$, where $\|f\|_\infty := \sup_{x \in [0,1]} |f(x)|$. $L^\infty$ loss is the hardest of the $L^p$
loss functions to estimate under, but provides intuitive risk bounds, simultaneously
describing local and global performance. If the function $f$ is known
to lie in the smoothness class $C^s(M)$ of functions with $s$-Hölder norm at
most $M$,

\[ C^s(M) := \Big\{ f \in C([0, 1]) : f \text{ has } k := \lceil s \rceil - 1 \text{ derivatives},\ \|f\|_\infty, \dots, \|f^{(k)}\|_\infty \le M,\ \sup_{x, y \in [0,1]} \frac{|f^{(k)}(x) - f^{(k)}(y)|}{|x - y|^{s-k}} \le M \Big\}, \]

then the $L^\infty$ minimax rate of estimation,

\[ \inf_{\hat f_n} \sup_{f \in C^s(M)} E_f \|\hat f_n - f\|_\infty, \]

decays like $(n/\log n)^{-s/(2s+1)}$ (see Tsybakov, 2009).
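To give a feel for this rate, the sketch below evaluates $(n/\log n)^{-s/(2s+1)}$ for a few smoothness levels. The function name `minimax_rate` and the sample values of $n$ and $s$ are illustrative assumptions, and constants depending on $M$ are ignored.

```python
import math

def minimax_rate(n, s):
    """Return (n / log n)^(-s / (2s + 1)), the sup-norm rate up to constants."""
    if n <= 1 or s <= 0:
        raise ValueError("need n > 1 and s > 0")
    return (n / math.log(n)) ** (-s / (2 * s + 1))

# Smoother functions (larger s) allow faster rates at every sample size.
for s in (0.5, 1.0, 2.0):
    rates = [minimax_rate(n, s) for n in (10**3, 10**4, 10**5)]
    print(s, [round(r, 4) for r in rates])
```

The exponent $s/(2s+1)$ increases towards $1/2$ as $s$ grows, so the gap between smoothness classes is what an adaptive procedure has to recover from the data.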

The simplest estimators attaining this rate depend on the quantities
$s$ and $M$, which in practice we will not know in advance. However, it is
possible to estimate $f$ adaptively: to choose an estimator $\hat f_n$, not depending
on $s$ or $M$, which nevertheless obtains the minimax rate over a range of
classes $C^s(M)$,

\[ \sup_{f \in C^s(M)} E_f \|\hat f_n - f\|_\infty = O\big((n/\log n)^{-s/(2s+1)}\big). \]

Techniques for constructing such estimators include Lepskii's method (Lep- 
skii, 1990), wavelet thresholding (Donoho et al., 1995), and model selection 
(Barron et al., 1999). 

Of course, to make full use of an adaptive estimator $\hat f_n$, we must also
quantify the uncertainty in our estimate. We would like to have a risk bound
$R_n$, depending only on the data, which satisfies $\|f - \hat f_n\|_\infty \le R_n$ with high
probability. Equivalently, we would like a confidence band,

\[ C_n := \{ f \in C([0, 1]) : \|f - \hat f_n\|_\infty \le R_n \}, \tag{1.1} \]

containing $f$ with high probability. To benefit from the adaptive nature
of $\hat f_n$, we would also like the radius $R_n$ to be adaptive, decaying at a rate
$(n/\log n)^{-s/(2s+1)}$ over any class $C^s(M)$.

Unfortunately, this is impossible in general (Low, 1997; Cai and Low,
2004). The size of an adaptive confidence band must depend on the parameters
$s$ and $M$, which we cannot estimate from the data: the function $f$
may be deceptive, superficially appearing to belong to one smoothness class
$C^s(M)$, while instead belonging to a different, rougher class. If we wish to
proceed, we must place further conditions on $f$.

Different conditions have been considered by Picard and Tribouley (2000),
Genovese and Wasserman (2008), Giné and Nickl (2010), and Hoffmann and
Nickl (2011). Of note, Giné and Nickl place a self-similarity condition on $f$,
requiring its regularity to be similar at large and small scales; they then obtain
confidence bands which contract adaptively over classes $C^s(M)$, where
$M > 0$ is fixed. Hoffmann and Nickl consider a weaker separation condition,
which allows adaptation to finitely many classes $C^{s_1}(M), \dots, C^{s_k}(M)$.

The conditions in these two papers are qualitatively different. In Hoffmann
and Nickl (2011), the family of functions $f$ under consideration at
time $n$ asymptotically contains the full model,

\[ \mathcal{F} := \bigcup_{i=1}^{k} C^{s_i}(M), \qquad 0 < s_1 < \dots < s_k,\ M > 0. \tag{1.2} \]

The confidence bands constructed are thus eventually valid for all functions
$f \in \mathcal{F}$, although the time $n$ after which a band is valid depends on the unknown
$f$. The penalty for this generality comes in the nature of the adaptive
result: the bands contract at rates $n^{-s_i/(2s_i+1)}$ for any $f \in C^{s_i}(M)$, but they
do not attain the minimax rate $n^{-s/(2s+1)}$ for $f \in C^s(M)$, $s \notin \{s_1, \dots, s_k\}$.
Conversely, in Giné and Nickl (2010), the bands attain the rate $n^{-s/(2s+1)}$
for any $f \in C^s(M)$, $s \in [s_{\min}, s_{\max}]$. However, the family of functions considered
does not, even in the limit, contain the full model,

\[ \mathcal{F} := \bigcup_{s = s_{\min}}^{s_{\max}} C^s(M), \qquad 0 < s_{\min} < s_{\max},\ M > 0. \tag{1.3} \]

Instead, some functions $f$ must be permanently excluded from consideration.

We can describe this difference in terms of dishonest confidence sets. We
say a confidence set $C_n$ for $f$ is honest, at level $1 - \gamma$, if it satisfies

\[ \limsup_n \sup_{f \in \mathcal{F}} P_f(f \notin C_n) \le \gamma, \tag{1.4} \]

where $\mathcal{F}$ is the entire family of functions $f$ we wish to adapt to (see Robins
and van der Vaart, 2006, and references therein). Honesty is necessary to
produce practical confidence sets; it ensures that there is a known time $n$,
not depending on $f$, after which the level of the confidence set is not much
smaller than $1 - \gamma$. In contrast, a dishonest set satisfies the weaker condition

\[ \sup_{f \in \mathcal{F}} \limsup_n P_f(f \notin C_n) \le \gamma. \]

While dishonest confidence sets are not useful for inference, they can provide
a useful benchmark of nonparametric procedures. The bands in Hoffmann
and Nickl (2011) are dishonest confidence sets for the full model (1.2); those
in Giné and Nickl (2010) are not, for the model (1.3).

In the following, we will show that this distinction is intrinsic: that
the problem of adapting to finitely many $s_i$ is fundamentally different from
adapting to continuous $s$. We will construct confidence bands which are
adaptive in the model (1.3), under a weaker self-similarity condition than in
Giné and Nickl (2010); functions satisfying this condition may be considered
typical members of any class $C^s(M)$. We will then show that our condition is
as weak as possible for adaptation over (1.3), and that no adaptive confidence
band can be valid, even dishonestly, for all of (1.3).

We also provide further improvements on past results. Firstly, past 
constructions of adaptive confidence sets under self-similarity have required 
sample splitting: splitting the data into two groups, one for estimating the 
function $f$, and the other for estimating its smoothness. In the construction
of our bands, we will show that this procedure can be avoided, leading to 
smaller constants in the rate of contraction. 

More importantly, in past results $M$ is assumed known; in general, this
assumption is required to obtain meaningful results. However, in practice,
we will not know $M$ in advance; we would much prefer to adapt also to the
unknown Hölder norm. We would thus like a confidence band which is valid
even for the model

\[ \bigcup_{M = 0}^{\infty} \bigcup_{s = s_{\min}}^{s_{\max}} C^s(M). \]

In Giné and Nickl (2010), the authors suggest the standard remedy
of undersmoothing: constructing bands valid for subsets of $C^s(M_n)$, with
$M_n \to \infty$ as $n \to \infty$. However, doing so not only incurs a rate penalty; it also
gives a dishonest band. We will instead show that, under the assumption
of self-similarity necessary for adaptation, we can perform honest inference
without an a priori bound on $M$.

We would therefore like to construct a confidence band for $f \in C^s(M)$,
which:

(i) is adaptive;

(ii) makes assumptions on $f$ as weak as possible; and

(iii) is honest simultaneously for a range of $s$, and all $M > 0$.

Confidence sets $C_n$ in the literature are often constructed to be asymptotically
exact, satisfying

\[ \sup_{f \in \mathcal{F}} |P_f(f \notin C_n) - \gamma| \to 0 \]

as $n \to \infty$. We will show that, using an undersmoothed estimator, we can
construct an exact confidence band, satisfying conditions (ii) and (iii), which
is rate-adaptive up to a logarithmic factor.

We will argue, however, that in this case exactness may be undesirable.
Instead, we will construct an inexact confidence band, satisfying only (1.4);
while we no longer know the exact level of our confidence band, this level is
guaranteed to be at least $1 - \gamma$. Our inexact band is centred at an adaptive
Lepskii-type estimator, is asymptotically smaller, more likely to contain the
function $f$, and satisfies all three conditions (i)-(iii).

As our bands cannot rely on a known (or unknown) bound on the Hölder
norm $M$, their construction differs significantly from those given previously
in the literature. We likewise describe new approaches to undersmoothing,
and to linking the white noise model with density estimation and regression.
In each case, rather than assuming $M$ is bounded, we must make
fundamental use of the self-similarity property of our functions $f$.

Our bands thus depend on self-similarity parameters $\varepsilon$ and $\rho$, which
determine the functions $f$ to be excluded. In this sense, they are no different
than any other technique, whether fixing a class $C^s(M)$ in advance, or using
one of the methods discussed previously. (The bands in Giné and Nickl,
2010, do not require a choice of parameters to construct, but they are honest
only over families which do; using them in practice would thus involve an
implicit choice of parameters.) The advantage in our bands is that, while
we must still exclude some functions $f$, we do so only where necessary for
adaptation.

The parameters $\varepsilon$ and $\rho$ may in practice be set by domain-specific knowledge,
or by convention, as is common with the confidence level $1 - \gamma = 95\%$.
Whether this is suitable for practical inference is a matter for further study. 
We leave the reader, however, with the words of Box: "all models are wrong, 
but some are useful." 

In Section 2, we describe our self-similarity condition, and in Section 3, 
we state our main results. We provide proofs in Appendices A-D. 

2 Self-similar functions 

To state our results, we must first define our self-similarity condition. We
will need a wavelet basis of $L^2([0, 1])$; for an introduction to wavelets, and
their role in statistical applications, see Härdle et al. (1998). We begin with
$\varphi$ and $\psi$, the scaling function and wavelet of an orthonormal multiresolution
analysis on $L^2(\mathbb{R})$. We make the following assumptions on $\varphi$ and $\psi$, which
are satisfied, for example, by Daubechies wavelets and symlets, with $N \ge 6$
vanishing moments (Daubechies, 1992, §6.1; Rioul, 1992, §14).

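Assumption 2.1.

(i) For $K \in \mathbb{N}$, $\varphi$ and $\psi$ are supported on the interval $[1 - K, K]$.

(ii) For $N \in \mathbb{N}$, $\psi$ has $N$ vanishing moments:

\[ \int_{\mathbb{R}} x^i \psi(x)\,dx = 0, \qquad i = 0, \dots, N - 1. \]

(iii) $\varphi$ is twice continuously differentiable.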

Using the construction of Cohen et al. (1993), we can then generate an
orthonormal wavelet basis of $L^2([0, 1])$, with basis functions

\[ \varphi_{j_0,k}, \quad k = 0, \dots, 2^{j_0} - 1, \]

\[ \psi_{j,k}, \quad j > j_0,\ k = 0, \dots, 2^j - 1, \]

for some suitable lower resolution level $j_0 > 0$. (See also Chyzak et al., 2001.)
For $k \in [N, 2^j - N)$, the basis functions are given by scalings of $\varphi$ and $\psi$,

\[ \varphi_{j_0,k}(x) := 2^{j_0/2} \varphi(2^{j_0} x - k), \qquad \psi_{j,k}(x) := 2^{j/2} \psi(2^j x - k). \]

For other values of $k$, the basis functions are specially constructed, so as to
form an orthonormal basis of $L^2([0, 1])$, with desired smoothness properties.

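Using this wavelet basis, we may proceed to define the spaces $C^s$ over
which we wish to adapt. Given a function $f \in L^2([0, 1])$, with wavelet coefficients

\[ \alpha_k := \int_0^1 f(x) \varphi_{j_0,k}(x)\,dx, \qquad \beta_{j,k} := \int_0^1 f(x) \psi_{j,k}(x)\,dx, \qquad k = 0, \dots, 2^{j_0} - 1 \text{ and } k = 0, \dots, 2^j - 1 \text{ respectively}, \]

and wavelet series

\[ f = \sum_k \alpha_k \varphi_{j_0,k} + \sum_{j > j_0} \sum_k \beta_{j,k} \psi_{j,k}, \]

for $s \in (0, N)$, define the $C^s$ norm of $f$ by

\[ \|f\|_{C^s} := \max\Big( \sup_k 2^{j_0(s+1/2)} |\alpha_k|,\ \sup_{j > j_0,\, k} 2^{j(s+1/2)} |\beta_{j,k}| \Big). \]
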
Define the spaces

\[ C^s := \{ f \in L^2([0, 1]) : \|f\|_{C^s} < \infty \}, \]

and for $M > 0$,

\[ C^s(M) := \{ f \in L^2([0, 1]) : \|f\|_{C^s} \le M \}. \]

For $s \notin \mathbb{N}$, these spaces are equivalent to the classical Hölder spaces;
for $s \in \mathbb{N}$, they are equivalent to the Zygmund spaces, which continuously
extend the Hölder spaces (Cohen et al., 1993, §4). In either case, we may
therefore take this to be our definition of $C^s$ in the following.

We are now ready to state our self-similarity condition. Denote the
wavelet series of $f$, for resolution levels $i$ to $j$, $i > j_0$, by

\[ f_{i,j} := \sum_{l=i}^{j} \sum_k \beta_{l,k} \psi_{l,k}, \]

and for $i = j_0$, by

\[ f_{j_0,j} := \sum_k \alpha_k \varphi_{j_0,k} + f_{j_0+1,j}. \]

Fix some $s_{\max} \in (0, N)$; for $s \in (0, s_{\max})$, $M > 0$, $\varepsilon \in (0, 1)$, and $\rho \in \mathbb{N}$, we
will say a function $f \in C^s(M)$ is self-similar, if

\[ \|f_{j,\rho j}\|_{C^s} \ge \varepsilon M \quad \forall\, j \ge j_0. \tag{2.1} \]

If $s = s_{\max}$, we will instead require (2.1) only for $j = j_0$. Denote the set of
self-similar $f \in C^s(M)$ by $C^s_0(M, \varepsilon, \rho)$; for fixed $\varepsilon$, $\rho$, we will denote this set
simply as $C^s_0(M)$.
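Condition (2.1) can be checked numerically on a finite table of wavelet coefficients. The sketch below is only a truncated illustration: it assumes the $C^s$ norm of a wavelet series is the supremum of $2^{l(s+1/2)}|\beta_{l,k}|$ over the levels involved, stores coefficients in a dictionary keyed by level (level $j_0$ standing in for the scaling coefficients), and tests only those $j$ with $\rho j \le$ `j_max`. The function names, the data layout, and the example coefficients are hypothetical, and the special case $s = s_{\max}$ is ignored.

```python
def truncated_norm(beta, s, i, j):
    """sup over levels i <= l <= j and positions k of 2^{l(s+1/2)} |beta[l][k]|."""
    best = 0.0
    for l in range(i, j + 1):
        for b in beta.get(l, []):
            best = max(best, 2.0 ** (l * (s + 0.5)) * abs(b))
    return best

def is_self_similar(beta, s, M, eps, rho, j0, j_max):
    """Check the norm of f_{j, rho*j} is >= eps*M for each j >= j0 with rho*j <= j_max."""
    return all(
        truncated_norm(beta, s, j, rho * j) >= eps * M
        for j in range(j0, j_max // rho + 1)
    )

s, M, eps, rho, j0, j_max = 1.0, 1.0, 0.5, 2, 2, 12
# Every level carries coefficients of the largest size allowed in C^s(M).
typical = {j: [M * 2.0 ** (-j * (s + 0.5))] * 2 ** j for j in range(j0, j_max + 1)}
# Same function with levels 4..8 wiped out: a long gap in the spectrum.
sparse = {j: ([0.0] * 2 ** j if 4 <= j <= 8 else typical[j]) for j in typical}

print(is_self_similar(typical, s, M, eps, rho, j0, j_max))
print(is_self_similar(sparse, s, M, eps, rho, j0, j_max))
```

The first call reports `True` and the second `False`: the gap between levels 4 and 8 means the block of levels from $j = 4$ to $\rho j = 8$ carries no signal, so large-scale behaviour says nothing about small scales there.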

The above condition ensures that the regularity of $f$ is similar at small
and large scales, and will be shown to be necessary to perform adaptive
inference. To bound the bias of an adaptive estimator $\hat f_n$, we need to know
the regularity of $f$ at small scales, which we cannot observe. If $f$ is self-similar,
however, we can infer this regularity from the behaviour of $f$ at
large scales, which we can observe.

Similar conditions have been considered by previous authors, in the context
of turbulence by Frisch and Parisi (1985) and Jaffard (2000), and more
recently in statistical applications by Picard and Tribouley (2000) and Giné
and Nickl (2010). We can show that condition (2.1) is weaker than the condition
in Giné and Nickl; we will see in Section 3 that it is, in a sense, as
weak as possible.

Proposition 2.2. Given $s_{\min} \in (0, s_{\max}]$, $b > 0$, $0 < b_1 < b_2$, and $j_1 \ge j_0$,
there exist $M > 0$, $\varepsilon \in (0, 1)$, and $\rho \in \mathbb{N}$ such that, for any $s \in [s_{\min}, s_{\max}]$,
the condition

\[ f \in C^s \cap C^{s_{\min}}(b), \qquad b_1 2^{-js} \le \|f_{j+1,\infty}\|_\infty \le b_2 2^{-js} \quad \forall\, j \ge j_1, \tag{2.2} \]

implies $f \in C^s_0(M, \varepsilon, \rho)$. Conversely, given $s \in (0, s_{\max}]$, $M > 0$, $\varepsilon \in (0, 1)$,
and $\rho > 1$, there exist $f \in C^s_0(M, \varepsilon, \rho)$ which do not satisfy the above condition,
for any $s_{\min} \in (0, s]$, $b > 0$, $0 < b_1 < b_2$, and $j_1 \ge j_0$.

In fact, we can show that self-similarity is a generic property: that the set
$\mathcal{D}$ of self-dissimilar functions, which for some $s$ never satisfy (2.1), is in more
than one sense negligible. Firstly, we can show that $\mathcal{D}$ is nowhere dense:
the self-dissimilar functions cannot approximate any open set in $C^s(M)$. In
particular, this means that $\mathcal{D}$ is meagre. Secondly, we can show that $\mathcal{D}$ is a
null set, for a natural probability measure $\pi$ on $C^s(M)$. We thus have that
$\pi$-almost-every function in $C^s(M)$ is self-similar.

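Proposition 2.3. For $s \in (0, s_{\max}]$ and $M > 0$, define

\[ \mathcal{D} := C^s(M) \setminus \bigcup_{\varepsilon \in (0,1),\, \rho \in \mathbb{N}} C^s_0(M, \varepsilon, \rho). \]

Further define a probability measure $\pi$ on $f \in C^s(M)$, with $f$ having independently
distributed wavelet coefficients,

\[ \alpha_k \sim M 2^{-j_0(s+1/2)}\, U([-1, 1]), \qquad \beta_{j,k} \sim M 2^{-j(s+1/2)}\, U([-1, 1]). \]

Then:

(i) $\mathcal{D}$ is nowhere dense in the norm topology of $C^s(M)$; and

(ii) $\pi(\mathcal{D}) = 0$.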
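The measure $\pi$ is straightforward to simulate on finitely many levels. The following sketch draws the coefficients as described in the proposition and reports, for each level, the rescaled supremum $\sup_k 2^{j(s+1/2)}|\beta_{j,k}|$. The function names, the truncation at `j_max`, the seed, and the parameter values are assumptions for illustration only.

```python
import random

def sample_pi(s, M, j0, j_max, rng):
    """Draw coefficients M * 2^{-j(s+1/2)} * U([-1, 1]) independently for
    levels j0..j_max; level j0 stands in for the scaling coefficients."""
    draw = {}
    for j in range(j0, j_max + 1):
        scale = M * 2.0 ** (-j * (s + 0.5))
        draw[j] = [scale * rng.uniform(-1.0, 1.0) for _ in range(2 ** j)]
    return draw

def level_norms(coeffs, s):
    """Per-level values sup_k 2^{j(s+1/2)} |coefficient|."""
    return {
        j: 2.0 ** (j * (s + 0.5)) * max(abs(c) for c in cs)
        for j, cs in coeffs.items()
    }

rng = random.Random(1)
norms = level_norms(sample_pi(0.75, 2.0, 3, 12, rng), 0.75)
print({j: round(v, 3) for j, v in norms.items()})
```

Each level contributes the maximum of $2^j$ independent uniform variables, so the rescaled supremum never exceeds $M$ and gets close to $M$ at fine levels. This is consistent with the claim that a draw from $\pi$ is self-similar almost surely, although the simulation itself proves nothing.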

These results are given for the self-similarity condition (2.2) in Giné and
Nickl (2010, §3.5), and Hoffmann and Nickl (2011, §2.5); as a consequence
of Proposition 2.2, they hold for our condition (2.1) also. We conclude that
the self-similar functions may be considered typical members of any class $C^s(M)$.

3 Self-similarity and adaptation 

We are now ready to state our main results. First, however, we will require
an additional assumption on our wavelet basis, allowing us to precisely control
the variance of our estimators. This assumption is verified for Battle-Lemarié
wavelets in Giné et al. (2011); for compactly supported wavelets,
the assumption is difficult to verify analytically, but can be tested with provably
good numerical approximations. In Bull (2011, §3), the assumption is
shown to hold for Daubechies wavelets and symlets, with $N = 6, \dots, 20$ vanishing
moments. Larger values of $N$, and other wavelet bases, can be easily
checked, and the assumption is conjectured to hold also in those cases.

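Assumption 3.1. The 1-periodic function

\[ \sigma_\varphi^2(t) := \sum_{k \in \mathbb{Z}} \varphi(t - k)^2 \]

attains its maximum $\bar\sigma_\varphi^2$ at a unique point $t_0 \in [0, 1)$, and $(\sigma_\varphi^2)''(t_0) < 0$.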
We may now construct a confidence band which, under self-similarity,
is exact, honest for all $M > 0$, and contracts at a near-optimal rate. We
centre the band at an undersmoothed estimate of $f$: an estimate slightly
rougher than optimal, chosen so that the known variance dominates the
unknown bias (as in Hall, 1992, for example). This allows us to construct
an asymptotically exact confidence band, although the larger variance leads
to a logarithmic rate penalty. We state our results for the white noise model,
which serves as an idealisation of density estimation and regression; we will
return later to consequences for the other models.

Theorem 3.2. In the white noise model, fix $0 < \gamma < 1$, $s_{\min} \in (0, s_{\max}]$,
and set

r„(s) := (n/logn)-"/(2s+i)jQg^^ y C^iM). 

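There exists a confidence band $C_n := C_n(\gamma, s_{\min}, s_{\max}, \varepsilon, \rho)$ as in (1.1),
with radius $R_n$, satisfying:

(i) $\sup_{f \in \mathcal{F}} |P_f(f \notin C_n) - \gamma| \to 0$; and

(ii) for a fixed constant $L > 0$, and any $s \in [s_{\min}, s_{\max}]$, $M > 0$,

\[ \sup_{f \in C^s_0(M)} P_f\big( R_n > L M^{1/(2s+1)} r_n(s) \big) \to 0. \]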

We can do better by dropping the requirement of exactness. Intuitively,
we may feel that an exact band should be preferable: given an inexact band,
surely we can modify it to produce something more accurate? In fact, this
is not necessarily the case. Consider a simplified statistical model, where
we wish to identify a parameter $\theta \in \mathbb{R}$, and have the luxury of observing
data $X = \theta$. The optimal confidence set for $\theta$ is thus $\{X\}$, but this set is
not exact at the 95% level. We can produce an exact set by adding noise:
if $Z \sim N(0, 1)$, the confidence set

\[ \{ x \in \mathbb{R} : |X + Z - x| \le \Phi^{-1}(0.975) \} \]

is exact at the 95% level. It is also clearly inferior. The perfect, inexact set
is preferable to the imperfect, exact one.
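The toy example can be checked by simulation. The sketch below, which uses only the Python standard library, estimates the coverage of the perfect set $\{X\}$ and of the noisy interval; the function name, the value of $\theta$, the number of trials, and the seed are arbitrary choices made for illustration.

```python
import random
from statistics import NormalDist

def simulate(theta, trials, rng):
    """Monte Carlo coverage of the perfect set {X} and the noisy 'exact' interval."""
    q = NormalDist().inv_cdf(0.975)      # Phi^{-1}(0.975)
    perfect_hits = 0
    noisy_hits = 0
    for _ in range(trials):
        x = theta                         # noiseless observation X = theta
        z = rng.gauss(0.0, 1.0)           # auxiliary noise Z ~ N(0, 1)
        perfect_hits += (x == theta)      # {X} always contains theta
        noisy_hits += (abs(x + z - theta) <= q)
    return perfect_hits / trials, noisy_hits / trials

perfect, noisy = simulate(theta=1.5, trials=20000, rng=random.Random(0))
print(perfect, noisy)
```

The perfect set covers $\theta$ in every trial, while the noisy interval covers it in roughly 95% of trials and has positive length: it is exact at the nominal level, yet worse on both counts.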

The situation is similar in nonparametrics. We can undersmooth, adding
noise to produce an exact band, but in doing so we make our band both
asymptotically larger, and less likely to contain the function $f$. In practice,
this is clearly undesirable. Instead, we will give one of the main results of
this paper: we will provide an inexact band, centred at an adaptive Lepskii-type
estimator, which under self-similarity is honest over a larger family of
functions, and exact rate-adaptive with respect to $s$ and $M$.
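Theorem 3.3. In the white noise model, fix $0 < \gamma < 1$, and set

\[ r_n(s) := (n/\log n)^{-s/(2s+1)}, \qquad \mathcal{F} := \bigcup_{s \in (0, s_{\max}],\, M > 0} C^s_0(M). \]

There exists a confidence band $C_n := C_n(\gamma, s_{\max}, \varepsilon, \rho)$ as in (1.1), with
radius $R_n$, satisfying:

(i) $\limsup_n \sup_{f \in \mathcal{F}} P_f(f \notin C_n) \le \gamma$; and

(ii) for a fixed constant $L > 0$, and any $s \in (0, s_{\max}]$, $M > 0$,

\[ \sup_{f \in C^s_0(M)} P_f\Big( R_n > \frac{L M^{1/(2s+1)}}{2^s - 1}\, r_n(s) \Big) \to 0. \]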



The constant in the above rate contains an extra $1/(2^s - 1)$ term, which
is present to allow for $s$ tending to 0. Note that if, as before, we restrict to
$s \ge s_{\min} > 0$, we may then fold this term into the constant $L$, producing a
rate of the same form as in Theorem 3.2.

As is standard, the rates adapt only to smoothnesses $s \le s_{\max}$; if $f$ is
smoother than our wavelet basis, we cannot reliably detect this from the
wavelet coefficients. However, our self-similarity condition (2.1) is weaker
when $s = s_{\max}$, and the class $C^{s_{\max}}_0(M)$ contains many smoother functions
$f$; in this case we obtain the rate of contraction optimal for $C^{s_{\max}}(M)$.

Theorem 3.3 is, in more than one sense, maximal. Firstly, we can verify
that the minimax rate of estimation over $C^s_0(M)$ is the same as over
$C^s(M)$. Since any adaptive confidence band must be centred at an adaptive
estimator, we may conclude that the above results are indeed optimal.

Theorem 3.4. In the white noise model, fix $0 < \gamma < \tfrac12$, $s \in (0, s_{\max}]$,
$M > 0$. An estimator $\hat f_n$ cannot satisfy

\[ \limsup_n \sup_{f \in C^s_0(M)} P_f\big( \|\hat f_n - f\|_\infty \ge r_n \big) \le \gamma, \]

for any rate $r_n = o\big((n/\log n)^{-s/(2s+1)}\big)$.

Secondly, we can show that the self-similarity condition (2.1) is, in a
sense, as weak as possible. In (2.1), the function $f$ is required to have
significant wavelet coefficients on resolution levels $j$ growing at most geometrically.
If we relax this assumption even slightly, allowing the significant
coefficients to occur less often, then adaptive inference is impossible.

For $s \in (0, s_{\max})$, $M > 0$, denote by $C^s_1(M)$ the set of $f \in C^s(M)$
satisfying the slightly weaker self-similarity condition,

\[ \|f_{j,\rho_j j}\|_{C^s} \ge \varepsilon M \quad \forall\, j \ge j_0, \]

for fixed $\varepsilon > 0$, and $\rho_j \in \mathbb{N}$, $\rho_j \to \infty$. Even allowing dishonesty, and with
known bound $M$ on the Hölder norm, we cannot construct a confidence band
which adapts to classes $C^s_1(M)$.

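Theorem 3.5. In the white noise model, fix $0 < \gamma < \tfrac12$, $0 < s_{\min} < s_{\max}$,
and $M > 0$. Set

\[ r_n(s) := (n/\log n)^{-s/(2s+1)}, \qquad \mathcal{F} := \bigcup_{s \in (s_{\min}, s_{\max})} C^s_1(M). \]

A confidence band $C_n$, with radius $R_n$, cannot satisfy:

(i) $\limsup_n P_f(f \notin C_n) \le \gamma$, for all $f \in \mathcal{F}$; and

(ii) $R_n = O_p(r_n(s))$ under $P_f$, for all $f \in C^s_1(M)$, $s \in (s_{\min}, s_{\max})$.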

As a consequence, we firstly cannot adapt to the full classes $C^s(M)$. More
importantly, we cannot, as in Hoffmann and Nickl (2011), obtain adaptation
merely by removing elements of the classes $C^s(M)$ which are asymptotically
negligible. In order to construct adaptive bands, we must fully exclude
some functions $f$ from consideration, and this remains true even when $M$ is
known.

The difference between these problems lies in the accuracy to which we 
must estimate s. To distinguish between finitely many classes, we need to 
know s only up to a constant; to adapt to a continuum of smoothness, we 
must know it with error shrinking like 1/logn. The finite-class problem is 
in this sense more like the adaptation problem studied in Bull and Nickl 
(2011); the distinctive nature of the $L^\infty$ adaptation problem is revealed only
when requiring adaptation to continuous s. 

While the above theorems are stated for the white noise model, we can 
prove similar results for density estimation and regression. The following 
theorem gives a construction of adaptive bands in these models; other results 
can be proved, for example, as in Giné and Nickl (2010), and Bull and Nickl
(2011). 

Theorem 3.6. In the density estimation model, let $s_{\min} \in (0, s_{\max}]$, or
in the regression model, $s_{\min} \in [\tfrac12, s_{\max}]$. In either model, the statement of
Theorem 3.3 remains true, for the family

and with constants L, L' depending on s and M. 
Acknowledgements

A Results on self-similarity

We begin by establishing that our self-similarity condition (2.1) is weaker
than (2.2), the condition in Giné and Nickl (2010).

Proof of Proposition 2.2. We first consider the case $s < s_{\max}$.

Given (2.2),

for $j \ge j_1$, $k \in [N, 2^j - N)$, we obtain

and similar bounds for $k \in [0, N) \cup [2^j - N, 2^j)$. We thus conclude $f \in C^s(M)$,
for a constant $M > 0$.

We will choose $\varepsilon \in (0, 1)$ small, $\rho \in \mathbb{N}$ large, so that $\rho j_0 \ge j_1$, and

C := M{e + 2'^f^<->-^'>) 
is small. If $f \notin C^s_0(M)$, we have $j_2 \ge j_0$ such that

\[ |\beta_{j,k}| < \varepsilon M 2^{-j(s+1/2)}, \]

for all $j \in [j_2, \rho j_2]$, $k \in [0, 2^j)$. Let $j_3 := \max(j_1, j_2)$. Then

(pj2 OO 

< M (e2-J3^ + 2-P^^') < C2-^^\ 

contradicting (2.2) for $C$ small. Thus, given (2.2), we have $M$, $\varepsilon$, and $\rho$ for
which $f \in C^s_0(M)$.

Conversely, given $s \in (0, s_{\max}]$, $M > 0$, $\varepsilon \in (0, 1)$, and $\rho > 1$, for $i \in \mathbb{N}$
set $j_i := \rho^i j_0$, and consider the function



/:=X:M2-^>(^+V2)^^^_^^^_, 
j=i 

in $C^s_0(M)$. We have

oo 

ll/i„+i,oolL<M ^ 2-^-"^ < 2--'"+^^ = 0(2-^"^) 



i=n+l 



as $n \to \infty$, so $f$ does not satisfy (2.2) for any $s_{\min}$, $b_1$, $b_2$, and $j_1$. As our
self-similarity condition is weaker for $s = s_{\max}$, the same is true also in that
case. □
B Constructing adaptive bands

To construct confidence bands satisfying the conditions in Section 3, we will
use estimators given by truncated empirical wavelet expansions,

\[ \hat f(j_n) := \sum_k \hat\alpha_k \varphi_{j_0,k} + \sum_{j_0 < j \le j_n} \sum_k \hat\beta_{j,k} \psi_{j,k}, \]

for the empirical wavelet coefficients

\[ \hat\alpha_k := \int \varphi_{j_0,k}(t)\,dY_t, \qquad \hat\beta_{j,k} := \int \psi_{j,k}(t)\,dY_t. \]

We will centre our bands on adaptive estimators $\hat f(\hat\jmath_n)$, where the resolution
level $\hat\jmath_n$ also depends on $Y$.

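We will consider several different choices of resolution level, corresponding
to different properties of the function $f$, and the class $C^s(M)$ to which
it belongs. We first consider the adaptive resolution choice $j_n^{\mathrm{ad}}$, chosen in
terms of the function $f$. Pick sequences $j_n^{\min}, j_n^{\max} \in \mathbb{N}$, $j_0 < j_n^{\min} < j_n^{\max}$, so
that $2^{j_n^{\min}} \sim (n/\log n)^{1/(2N+1)}$, and $2^{j_n^{\max}} \sim n/\log n$. Further define

\[ c_{n,\mu} := (n/(\log n)^{\mu})^{-1/2}, \]

and for $\kappa > 0$, $\mu \ge 1$, let

\[ j_n^{\mathrm{ad}}(\kappa, \mu) := \sup\big( \{j_n^{\min}\} \cup \{ j_n^{\min} < j \le j_n^{\max} : \sup_k |\beta_{j,k}| > \kappa c_{n,\mu} \} \big). \]

While $j_n^{\mathrm{ad}}$ is unknown, we can estimate it by a Lepskii-type resolution choice,

\[ \hat\jmath_n^{\mathrm{ad}}(\kappa, \mu) := \sup\big( \{j_n^{\min}\} \cup \{ j_n^{\min} < j \le j_n^{\max} : \sup_k |\hat\beta_{j,k}| > \kappa c_{n,\mu} \} \big), \]

which depends only on the data. Fix $\lambda > \sqrt{2}$, $\nu \ge 1$, and for convenience
set $\hat\jmath_n^{\mathrm{ad}} := \hat\jmath_n^{\mathrm{ad}}(\lambda, \nu)$. If $\nu = 1$, we will see $\hat f(\hat\jmath_n^{\mathrm{ad}})$ is then an adaptive estimator
of $f$; if $\nu > 1$, it is near-adaptive.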
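The Lepskii-type rule above amounts to keeping the finest resolution level at which some empirical coefficient still exceeds the threshold $\kappa c_{n,\mu}$. A minimal sketch follows, assuming the empirical coefficients are stored in a dictionary keyed by level and that the upper level `j_max` is included in the search; the function names and the example numbers are hypothetical.

```python
import math

def threshold(n, mu):
    """c_{n,mu} = (n / (log n)^mu)^(-1/2)."""
    return (n / math.log(n) ** mu) ** -0.5

def resolution_choice(beta_hat, n, kappa, mu, j_min, j_max):
    """Largest level in (j_min, j_max] with a coefficient above kappa * c_{n,mu};
    falls back to j_min when no level qualifies."""
    cut = kappa * threshold(n, mu)
    chosen = j_min
    for j in range(j_min + 1, j_max + 1):
        if max(abs(b) for b in beta_hat[j]) > cut:
            chosen = j
    return chosen

n = 10_000
# Hypothetical empirical coefficients: signal up to level 5, noise beyond.
beta_hat = {3: [0.5, -0.4], 4: [0.2], 5: [0.1, -0.02],
            6: [0.01], 7: [-0.02], 8: [0.005]}
j_hat = resolution_choice(beta_hat, n, kappa=2.0, mu=1.0, j_min=3, j_max=8)
print(j_hat)
```

Note that the rule takes a supremum over qualifying levels, so it does not stop at the first level that falls below the threshold; a later level with a large coefficient would still be selected.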

While the above statements are true for general $f$, they do not provide
us with an estimate of the error in $\hat f_n$. To produce confidence bands, we must
estimate the smoothness of $f$, and this is where self-similarity is required.
We will consider values of the truncated Hölder norm,

\[ M^s_{i,j} := \|f_{i,j}\|_{C^s}, \]

which measures the smoothness of $f$ at resolution levels $i$ to $j$. In a slight
abuse of notation, set $\beta_{j_0,k} := \alpha_k$, and $\hat\beta_{j_0,k} := \hat\alpha_k$. (Note that $\beta_{j_0,k}$ and $\hat\beta_{j_0,k}$
are otherwise undefined, as the wavelets $\psi_{j,k}$ exist only for $j > j_0$.) We may
then bound $M^s_{i,j}$ by the quantities

\[ \underline{M}^s_{i,j} := \sup_{i \le l \le j,\, k} 2^{l(s+1/2)} \big( |\hat\beta_{l,k}| - \sqrt{2} c_{n,1} \big)_+, \]

\[ \overline{M}^s_{i,j} := \sup_{i \le l \le j,\, k} 2^{l(s+1/2)} \big( |\hat\beta_{l,k}| + \sqrt{2} c_{n,1} \big), \]

and we will show in Appendix C that for $j \le j_n^{\max}$, $M^s_{i,j} \in [\underline{M}^s_{i,j}, \overline{M}^s_{i,j}]$, with
high probability.

Set ji = pJq, j2 = [jn'^/p\, js = jn'^, and suppose n is large enough that 
'° > PJi, so jo < ji < J2 < J3- If / e C^{M) for s < Smax, then with high 
probability, 

W- ■ w ■ 

—3o,n jo,ji 

Assuming further $s \ge s_{\min}$, for some $s_{\min} \ge 0$, we can lower bound $s$ by

\[ \hat s_n := \inf\big( \{s_{\max}\} \cup \{ s \in [s_{\min}, s_{\max}) : R(s) \ge \varepsilon \} \big). \]

Since 

R(S) = 

is increasing in $s$, $\hat s_n$ can be found efficiently using binary search.
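The binary search mentioned here can be sketched as follows. The code treats $R$ as an arbitrary increasing function supplied by the caller, since only its monotonicity matters for the search; the function name, the tolerance, and the stand-in $R(s) = s^2$ used in the example are assumptions, not the paper's quantities.

```python
def smoothness_lower_bound(R, eps, s_min, s_max, tol=1e-9):
    """Approximate inf({s_max} U {s in [s_min, s_max) : R(s) >= eps})
    for an increasing function R, by bisection."""
    if R(s_min) >= eps:
        return s_min
    # Invariant: R(lo) < eps, and hi is either s_max or satisfies R(hi) >= eps.
    lo, hi = s_min, s_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if R(mid) >= eps:
            hi = mid
        else:
            lo = mid
    return hi

R = lambda s: s * s          # hypothetical increasing stand-in for R(s)
print(smoothness_lower_bound(R, eps=1.0, s_min=0.1, s_max=2.0))
print(smoothness_lower_bound(R, eps=10.0, s_min=0.1, s_max=2.0))
```

When no $s$ below $s_{\max}$ reaches the level $\varepsilon$, the upper end of the bracket never moves and the function returns $s_{\max}$ exactly, matching the convention in the definition. Each iteration halves the bracket, so the cost is logarithmic in the required precision.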
Likewise, set 

M{s) := e-^l^^, 
and Mn '■= M(s„). With high probability, 

and as the LHS is decreasing in $s$, also

\[ \hat M_n 2^{-j_1(\hat s_n + 1/2)} \ge M 2^{-j_1(s + 1/2)}. \]

Using these bounds, we can control the error in $\hat f$, producing adaptive confidence
bands for $f$.

To construct the bands, we will introduce some more resolution choices 
jn- Firstly, we consider the class resolution choice j^, chosen in terms of the 
class C*(M). For k > 0, > 1, define 

fJ{K,fi) := sup ({if"} U {j > jVf - : M2-^(^+V2) > ^,^^^}^ 

= max [log2(MKc„,^)/(5 + i)J) , (B.l) 

which we can estimate by 

fJ{K,i2) := max (jf^, [log2(Mn/Atc„,^)/(s„ + • (B.2) 



Secondly, to produce exact confidence bands, we will need the under- 
smoothed resolution choice j^^. Fix Un G N, 2"" ~ logn, and set 
defining similarly, in terms of . Fix < S < \pl small, let A := A + 
and A := A — \/2. For convenience, write j'^ := jn{^, 1), jn^ '■= in^(A) 1)? 
likewise j^', j^^'. 

We may now proceed to define our bands. Let 



a(j) := V21og(2)i, 
b{j) ■■= a{j) 



log(7rlog2) + logj - i log(l + v^) 



2a(j) 

x(7) :=-log(-log(l-7)). 



a(j) 

/(j) := max(i,min(j-^',i^^^)), 
R2{3) ■■= r^A(2'(-'')/2 - 2-'V2)c„,,/(l - 2^1/2)^ 

^ , .. f v,M„2-'(-'>~"/(2^" - 1) > 0, 
R3U) ■ 



[00, Sn = 0, 

where a^^ is given by Assumption 3.1, 

:= sup 2-(^o+i)/2 V|V;,„+i,fc(t)| = sup sup 2-^/''y2\^j^k{t)\, (B.3) 
te[o,i] J>io«G[o,i] 

and 



If we set Smin > 0, > 1, the undersmoothed resolution choice with 
confidence radius 

i?-:= i2i(jr,7), 

will be shown to give a band C^^ satisfying Theorem 3.2. If instead we set 
■Smin = 0, = 1, and define 

then the adaptive resolution choice j^'^, with confidence radius 

Rn ■■= RlOn, In) + R2{jf) + RsQn), 

will be shown to give a band C^'^ satisfying Theorem 3.3. 






C Constructive results 



We now prove our results on the existence of adaptive confidence bands. To
proceed, we will decompose the error in estimates $\hat f(j)$ into variance and
bias terms,

\[ \|\hat f(j) - f\|_\infty \le \|\hat f(j) - \bar f(j)\|_\infty + \|\bar f(j) - f\|_\infty, \]

where

\[ \bar f(j) := E_f[\hat f(j)] = f_{j_0,j}. \]

To control the variance, we will need the following result from Bull (2011).

Lemma C.l. Let < 7„, < 70 < 1, and 7"^ = o(n~"), for all a > 0. Then 
as n ^ 00, uniformly in f £ L'^{[0, 1]), 



sup 



7n^F U(jn) I bUn)\ > x(7n) 



0. 



To bound the bias, we must control the estimators and M„. We 

will show that, on events with probability tending to 1, these estimators 
are close to the quantities they bound. 



Lemma C.2. Set := j-'^(A,zv), C ■= 3?f{K,^)- For s G [s„ 
M > 0, and f G Cq(M), we have events En, with ¥{En) — >■ 1 uniformly, on 
which: 

(^) i°^<]t<ln: 

(ii) Sn < s, and M„2-^'i(^"+V2) > m2-^i{s+i/2) . 
(Hi) Sn > Sn, and Mn < M^n', 
for sequences Mn, Sn satisfying 

Mn/M ^ £-1, log2(n)(s - Sn) ^ S, 

uniformly over f G Cq{M), with constant S > depending on N, e, p, and 
A. Also on En, for any < k < A + \/2, 1 < < i^- 

(iv) f^{K,iJi)>j!:f; 

(v) j^{K,p) < j^{ti,fi) < jn{K',li) + Jn{K,fi); and 

(vi) j^''{K,fi) < < Jnif^^fJ-) + Jnil^^l^y, 

for sequences J^'(k, J^^(k, /i) — )■ 25, uniformly over f G Cq{M). 
Proof. For n such that j™™ < p'^jo, set £"„ := 0. Otherwise, let En be the 
event that 

2^ — 1 " I— 
sup SUp|/3j,fc - /3j,fc| < V2Cn,l, 

and if n is large enough that > j™™, also 

for j4, A;4 as follows: set j4 := j^*^, and choose k4 to satisfy |/3j4^fc4| > Xcn,u, 
which is possible by the definition of j^. Now, for x > 0, 1 — <I>(x) < (f){x)/x, 
so we have 

i=io k=o 

< (^logn)-i/2 (V2<5-in-^'/2^2^"^'=+in-i) 
= o((logn)-3/2) . 

(i) If i^'^ = i™''', then trivially j^"^ > j^'^. Otherwise, on En, 

and again jn'^ > j^'^. Similarly, for all < J < J™**^! 

|^j,fc| < |/3i,fc| + \/2c„,i < Ac 

n W ^ ~7&d 

so jr < • 

(ii) On En, we have 

for any i < j < j^^^- If s < Smax, by the argument given in Ap- 
pendix B, we then obtain 

Sn < s, M„2-^i(^~"+i/2) > ^2^-'i(^+i/2). 

If s = Smax, the results follow similarly, noting that s„ < Smax by 
definition. 

(iii) On En, js = jn^ < 3n^ < 3n{k, v), and for n large f^{\, v) > j^'"", so 

dn := Cn,i2r^^'+'/'^ < c„,,2^3(.+i/2) < M^-i^ 

and also 

en ■■= Cn,l2Ms+im ^ 0. 
We then obtain 
for a sequence 

Rn £"^(1 + 2\/2A"^) =: R. 

On s„ < s < Smax by (ii), so if s„ = Smax, we are done. If not, 
then R{sn) > £, and 

i2-jl>br7pJ-il=:^n, 

we have 

S„ > S - log2 (e"^i?n)/5n — Sn, 

and since 5„ ~ log2(n)/p(2A^ + 1); 

log2(n)(s - sn) ^ />(2iV + 1) log2(e-^i?) =: S. 

Likewise, 

Mn < M{s) < e'^{M%j, + 2V2en) < e~^{M + 2\/2e„) < M„, 
for a sequence M„ > 0, with M^jM — )- e~-^. 

(iv) If = j^'"", then trivially ^) > j^-^. If not, on E^, for j = jf, 
we have some k such that |/3j,fc| ^ ^Cn,u- Hence 

and again jn{K,fi) > jf. 

(v) On En, by the above we have 

and so jn{K,fi) > jn{K,fJ^)- Equahy, from (B.l), (B.2) and the above, 
we obtain 

fr!{K,fl)-fr!{K,n) < 1 + 2 log2 (M„/M) + 4 log2 (^/^M//^) (s - s„) 
for a sequence J^'(k,^) — )• 25. 
(vi) From (v), we also have 

for a sequence J^^{K,fj,) — )• 25". □ 
We may now bound the bias of $\hat f$ in terms of our estimators,
which bound the true parameters by the above lemma.

Lemma C.3. Let jn > i^'^. On events En as in Lemma C.2, for any s G 

[Smin,Smax], M > 0, and f £ C^{M), 

||/0n)-/IL<«2(jn) + i?3(jn). 

Proof. If Sn = 0, this is trivial. If not, by Lemma C.2, on En we have 

jn > > f^, and for j > jn, M2-^(^+V2) < M„2-^("~"+i/2)_ ^j^^g 



\i=in + l j = l(j„) + l 

<R2{jn)+RM- □ 

We are now ready to prove our theorems. First, we consider the exact 
band . 

Proof of Theorem 3.2. 
(i) Define the terms 

(X 
c(i) ~ ^^^^ 

F{j) ■■= d{j,\\f{j)-f\U, 

G{j):=d{j,\\f{j)-fmoo), (C.l) 

H{j) ■■= d [j, \\mr„ui,oo - /(^•)j-+i,oo|L) • 

We will show that uniformly in j, F, G and H are close, and H is 
independent of j^, so we may bound F{j^) by Lemma C.l. 

By definition, s„ > Smin > 0, and j^^ > jn{A, 1) ^ Jn > so on the events 
En, by Lemma C.3, 



\FijT)-Gifnn\<^R3ij'n 



< 



cOr) ^ ~ V 2^" - 1 

(j:'(A,l)log(n))"'""" = o(l) 



Jn 



J^'(A,i) 
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sincei^'(A,l) >in", and 
Similarly, for jn > j^^, on E^, 

-rad 

\G{jn) - H{jn)\ < E 2^'^' ^Mkk - Pj,k\ 

<0r/j^'(A,l))^/^2-0"'(A,i)-J:^)/2 
< 2-On (A,i)-j;i'(A,i^))/2 ^ ^^^^^ 

since 

i^'(A, 1) - f^{\, v) > log2(log(n)) ^ oo. 

On depends only on jdj^j^ for j < jff < j'^, and H{j) depends 

only on Pj^^ for J > Jn"') so -ff(j) is independent of j^^. Hence, given 
x,e > 0, for n large, and any j > j^^, 

¥{F{j) < X I En,fn" = j) > P(G(i) <x-e\ = j) 

>P(i^(j) <^-2e| i?n,ir =J) 
= P(i^(i) < X - 2e I 

> F{G{j) <x-3e\En) 
>¥{G{j)<x-3e)-¥{E'n) 

> exp (-e-(^-3^)) -o(l). 

Likewise, 

P(F(i) > a; I EnJT = J) < exp (-e-(-+3-)) + o(l). 

As these results are uniform in j > and true for any £ > 0, we 

have 



sup 



{F{j) > X I = j) - exp (-e--) | ^ 0. 
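The limit in this step is the Gumbel distribution function $\exp(-e^{-x})$. As a rough numerical illustration, reading the two preceding displays as lower and upper bounds at $x - 3\varepsilon$ and $x + 3\varepsilon$ (our reading of the garbled source), the sketch below shows that the gap between the two bounds shrinks linearly in $\varepsilon$. The function names are hypothetical.

```python
import math


def gumbel_cdf(x: float) -> float:
    """Gumbel distribution function exp(-exp(-x))."""
    return math.exp(-math.exp(-x))


def sandwich(x: float, eps: float) -> tuple[float, float]:
    """Bounds exp(-e^{-(x - 3 eps)}) and exp(-e^{-(x + 3 eps)}) around the limit at x."""
    return gumbel_cdf(x - 3.0 * eps), gumbel_cdf(x + 3.0 * eps)


if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        lower, upper = sandwich(1.0, eps)
        print(eps, lower, gumbel_cdf(1.0), upper, upper - lower)
```

Since the Gumbel density is bounded by $1/e$, the gap is at most $6\varepsilon/e$, so letting $\varepsilon \to 0$ after $n \to \infty$ recovers the limit.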

On En, we have ^'^^ > jl"", so 

oo 

p(i^(jr) < ^ I £^n) = nF{j) < X I i?n, jr = mfn = j i En) 

3=3^ 

oo 

= (exp (-e--) + o(l)) ^ P(jr = J I ^n) 
= exp (-e"'^) +o(l). 
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Since P(ii'n) — > 1, we obtain P(F(j°^) < x) — > exp (— e ^) , and rear- 
ranging, 

As the limits are all uniform in $f$, the result follows.

(ii) Let J^^ := J^^(A, 1), so on En, < + by Lemma C.2. For n 
large, j.^' > j™'", so 

/ s/r \ l/(2s+l) 

2:'nV2 ( ^ ) , 2^-^/2 log(n)2^" /2, (c.2) 

VCn,l/ 

and 

As $\mathbb{P}(E_n) \to 1$ uniformly, and the limits are uniform over $f \in C_0^s(M)$,
the result follows. □

We now move on to the adaptive band $C_n^{ad}$. As the variance term is no
longer independent of $j_n$, we must use a different method to establish the
validity of our band. We will instead consider $j_n^{\max} - j_n^{\min} + 1$ confidence
bands, one for each possible choice of $j_n$, and show that the effect of this
change is asymptotically negligible.

Proof of Theorem 3.3. 

(i) Let G{j) be given by (C.l). From Lemma C.l, we have 

noO^) > x(7n)) < P (3 i G [jr°, jT^] : GU) > x(7„)) 

-•max 
Jn 

< P(G(j)>x(7„)) 

J Jn 

= (jr^-jr" + i)(i+o(i))7n 

= 7 + 0(1)- 
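The chain of inequalities above is a union bound over the candidate resolution levels. A minimal sketch of the arithmetic, assuming the per-level error is chosen as $\gamma_n = \gamma / (j_n^{\max} - j_n^{\min} + 1)$ (the definition of $\gamma_n$ lies outside this appendix, so this choice is an assumption, as are the function names):

```python
def per_level_gamma(gamma: float, j_min: int, j_max: int) -> float:
    """Split a total error budget gamma evenly over levels j_min..j_max (assumed choice)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    if j_max < j_min:
        raise ValueError("need j_max >= j_min")
    return gamma / (j_max - j_min + 1)


def union_bound(gamma_n: float, j_min: int, j_max: int) -> float:
    """Upper bound on the chance that any of the levels fails, by summing per-level errors."""
    return (j_max - j_min + 1) * gamma_n


if __name__ == "__main__":
    g_n = per_level_gamma(0.05, 3, 7)
    print(g_n, union_bound(g_n, 3, 7))
```

The number of levels grows only logarithmically in $n$, so the per-level quantile grows slowly; this matches the later remark that the quantile is of order $\log \log n$.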

Rearranging, we get 

P - /(jf )lloo > RlifJ,7n)) < 7 + 0{1). 

By Lemma C.3, on the events En, 

\\f{fJ)-f\L<R20f) + R3Q?f) 

and by Lemma C.2, $\mathbb{P}(E_n) \to 1$. Since

11/ - /(j-f )IL < WfOf) - fOn )\\oo + WfOn ) - /Hoc: 

we obtain 

P(/0Cf)<7 + o(l). 
As the limits are uniform in $f$, the result follows.
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(ii) Since jj^*^ > j™™, and x(7„) = O (log log n), we have that -Ri(j^'^,7n) is 
dominated by b{jf)cCjf). Let J^' := J^'(A, 1), so on En, jf < 3^ < 
Jn + by Lemma C.2. For n large, > j™™, so by (C. 2), we 
obtain 

^i(in',7n) < ^j-' + J^'2(^"+^")/2n-i/2 < M VC^^+D r„ (.) . 
Likewise on for n large j,^' + < J™^^, so l{jn'^) = j^, and 
^2(3:'^) < 20"'+^"')/2c„,i < Mi/(2^+i)r„(.). 

Also for n large, 5,^ > s„ > 0, so 

M , /lfi/(2s+i) 
R^f^ad) < ^^^^ 2-^"'" < —— -r (s) 

As $\mathbb{P}(E_n) \to 1$ uniformly, and the limits are uniform over $f \in C_0^s(M)$,
the result follows. □

Finally, we prove our result on confidence bands in density estimation 
and regression. 

Proof of Theorem 3.6. We can prove the result analogously to Theorem 3.3. 
To bound the bias term, we will sketch a version of Lemma C.2 for the 
density estimation and regression models. It is possible to also adapt the 
variance bound Lemma C.l, as discussed in Bull (2011, §2); however, we 
will provide a weaker bound, as a consequence of our lemma. 
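Consider the empirical wavelet coefficients

$$\hat\alpha_k := \frac{1}{n} \sum_{i=1}^n \varphi_{j_0,k}(X_i), \qquad \hat\beta_{j,k} := \frac{1}{n} \sum_{i=1}^n \psi_{j,k}(X_i),$$

in density estimation, or

$$\hat\alpha_k := \frac{1}{n} \sum_{i=1}^n \varphi_{j_0,k}(x_i) Y_i, \qquad \hat\beta_{j,k} := \frac{1}{n} \sum_{i=1}^n \psi_{j,k}(x_i) Y_i,$$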

in regression. To prove the lemma, we must find an event $E_n$ on which, with
high probability, these estimates are close to the true wavelet coefficients $\alpha_k$,
$\beta_{j,k}$. In density estimation, we use Bernstein's inequality, noting that, for
$j \ge j_0$, $k \in [N, 2^j - N)$, the empirical wavelet coefficients satisfy



n 

with similar bounds for the other coefficients. 
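To make the empirical wavelet coefficients concrete, here is a small Python sketch using the Haar wavelet. Haar is a simplifying assumption made for illustration only, since the wavelet basis used in the paper is not specified in this appendix; the function names are likewise hypothetical.

```python
import math
from typing import Sequence


def haar_psi(j: int, k: int, x: float) -> float:
    """Haar wavelet psi_{j,k}(x) = 2^{j/2} psi(2^j x - k)."""
    t = (2 ** j) * x - k
    if 0.0 <= t < 0.5:
        return 2.0 ** (j / 2)
    if 0.5 <= t < 1.0:
        return -(2.0 ** (j / 2))
    return 0.0


def beta_hat_density(j: int, k: int, sample: Sequence[float]) -> float:
    """Density estimation: average of psi_{j,k}(X_i) over the sample."""
    return sum(haar_psi(j, k, x) for x in sample) / len(sample)


def beta_hat_regression(j: int, k: int, responses: Sequence[float]) -> float:
    """Fixed-design regression: average of psi_{j,k}(x_i) Y_i with x_i = i/n."""
    n = len(responses)
    return sum(haar_psi(j, k, i / n) * y
               for i, y in enumerate(responses, start=1)) / n


if __name__ == "__main__":
    evenly_spread = [(i + 0.5) / 8 for i in range(8)]
    print(beta_hat_density(1, 0, evenly_spread))
    print(beta_hat_regression(1, 0, [1.0] * 8))
```

For an evenly spread sample the density coefficient vanishes, as it would for the uniform density. In the regression case a constant response gives a small nonzero value, a discretisation effect of the grid $x_i = i/n$.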






The regression model is often identified with the white noise model, for
$f$ in classes $C^s(M)$, $s > \frac12$ (Brown and Low, 1996). In this case, however, we
wish to consider functions with unbounded Hölder norm, so we must discuss
regression explicitly. To control the empirical wavelet coefficients, we use a
Gaussian tail bound, noting that for $j$, $k$ as before,

^ i=l ^ i=l J 

For $j \le j_n^{\max}$, as $n \to \infty$, the mean and variance are thus

$$\beta_{j,k} + O(n^{-1/2}\|f\|_{C^{1/2}}) \quad\text{and}\quad \frac{\sigma^2}{n}(1 + o(1)),$$

uniformly. Again, similar results hold for the other coefficients.
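The variance of a regression wavelet coefficient can be computed directly from the model $Y_i = f(x_i) + \varepsilon_i$, $\varepsilon_i \sim N(0, \sigma^2)$, $x_i = i/n$: it equals $\sigma^2 n^{-2} \sum_i \psi_{j,k}(x_i)^2$, which is close to $\sigma^2/n$ when the Riemann sum of $\psi_{j,k}^2$ is close to 1. The sketch below checks this for the Haar wavelet, which is an illustrative assumption and not necessarily the basis used in the paper.

```python
def haar_psi(j: int, k: int, x: float) -> float:
    """Haar wavelet psi_{j,k}(x) = 2^{j/2} psi(2^j x - k)."""
    t = (2 ** j) * x - k
    if 0.0 <= t < 0.5:
        return 2.0 ** (j / 2)
    if 0.5 <= t < 1.0:
        return -(2.0 ** (j / 2))
    return 0.0


def coefficient_variance(j: int, k: int, n: int, sigma: float) -> float:
    """Variance of (1/n) sum_i psi_{j,k}(i/n) Y_i under independent N(0, sigma^2) noise."""
    # Riemann sum of psi_{j,k}^2 over the design points x_i = i/n.
    energy = sum(haar_psi(j, k, i / n) ** 2 for i in range(1, n + 1)) / n
    return sigma ** 2 * energy / n


if __name__ == "__main__":
    n, sigma = 1024, 2.0
    print(coefficient_variance(3, 2, n, sigma), sigma ** 2 / n)
```

When $n$ is a multiple of $2^j$ the Haar Riemann sum is exact, so the two printed values agree up to rounding; for general wavelets and design sizes the agreement holds only up to a $1 + o(1)$ factor.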

We thus, in both cases, have events $E_n$ comparable to those in Lemma C.2,
but with bounds on wavelet coefficients now depending on the unknowns
$\|f\|_\infty$ and $\|f\|_{C^{1/2}}$. We will bound them with statistics

T:=C\\f{n)\\cs.^^^+D, 

for constants $C, D > 0$. In density estimation, for $C$, $D$ large this satisfies

$$\sup \mathbb{P}_f(T < \|f\|_\infty) \to 0,$$

and likewise in regression,

$$\sup \mathbb{P}_f(T < \|f\|_{C^{1/2}}) \to 0.$$

In either model, for $s \in [s_{\min}, s_{\max}]$, $M > 0$,

$$\sup \mathbb{P}_f(T > CM + D + 1) \to 0.$$

We may thus replace $\|f\|_\infty$, or $\|f\|_{C^{1/2}}$, with $T$ in the above, obtaining an

analogue of Lemma C.2 which holds for all / G 

We therefore obtain a bound on the bias term, as in Theorem 3.3. To 
bound the variance term, we note that on the event En, we have 

\\f{jn)-f{jn)\L = 0{2^"/'cn,l), 

uniformly in all j„ < j^^^; we may then proceed as before. □ 
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D Negative results 

We now prove our negative results. First, we will need a testing inequality 
for normal means experiments, arguing as in Ingster (1987). We will prove 
a modified result, which controls the performance of tests also under small 
perturbations of the means. 

Lemma D.1. Suppose we have independent observations $X_1, \dots, X_n$, and
$Y_1, Y_2, \dots$, and we wish to test the hypothesis

$$H_0 : X_i, Y_i \sim N(0, 1),$$

against alternatives

$$H_k(\nu) : X_i \sim N(\mu \delta_{i,k}, 1), \quad Y_i \sim N(\nu_i, 1),$$

1 1 2 9 

for $k = 1, \dots, n$, and $\mu, \nu_i \in \mathbb{R}$, ||z^|| < ^ . Let $T = 0$ if we accept $H_0$, or
$T = 1$ if we reject. There is a choice of $k$, not depending on $\nu$, for which the
sum of the Type I and Type II errors satisfies

¥ho{T = 1)+ inf PH,(.)(T = 0)>l-n-i/2(eM^_ 1)1/2 1)1/2. 

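Proof. Consider first the case $\nu = 0$. The density of $\mathbb{P}_{H_k(0)}$ w.r.t. $\mathbb{P}_{H_0}$ is

$$Z_k := e^{\mu X_k - \mu^2/2}.$$

Let $Z := n^{-1} \sum_{k=1}^n Z_k$. Then $\mathbb{E}_{H_0} Z = 1$, and $\mathbb{E}_{H_0} Z^2 = 1 + n^{-1}(e^{\mu^2} - 1)$, so

$$\mathbb{E}_{H_0}(Z - 1)^2 = \operatorname{Var}_{H_0} Z = n^{-1}(e^{\mu^2} - 1).$$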

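We thus have

$$\begin{aligned}
\mathbb{P}_{H_0}(T = 1) + \max_k \mathbb{P}_{H_k(0)}(T = 0) &\ge \mathbb{P}_{H_0}(T = 1) + n^{-1} \sum_{k=1}^n \mathbb{P}_{H_k(0)}(T = 0) \\
&= 1 + \mathbb{E}_{H_0}[(Z - 1) 1(T = 0)] \\
&\ge 1 - \operatorname{Var}_{H_0}(Z)^{1/2} \\
&= 1 - n^{-1/2}(e^{\mu^2} - 1)^{1/2}.
\end{aligned}$$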

Fix k maximizing the above expression, and consider a hypothesis Hk{v) 

1 1 2 9 

with ||z^|| < ^ . The density oiW^^i^^^ w.r.t. ^^^(o) is 

Z' ■- 1'^ ^ 

and similarly we have 

$$\mathbb{E}_{H_k(0)}(Z' - 1)^2 = \operatorname{Var}_{H_k(0)} Z' = e^{\|\nu\|^2} - 1.$$
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Thus 



p^,„(r = i)+p^^(,)(r = o) 

= ¥ho{T = 1) +Ph,(o)(T = 0) +E^^(o)[(Z' - 1)1(T = 0)] 

> ¥h,{T = 1) +PH,(o)(r = 0) - Var^,(o)[Zf /2 
>l_n-i/2(eM^_ 1)1/2 1)1/2^ 

As this is true for all ||z^|| < ^ , the result follows. □ 

We may now prove our result on minimax rates in $C_0^s(M)$. For $f \in C^s(M)$,
the argument is standard (see, for example, Tsybakov, 2009, §2.6.2),
but we must check that we can construct suitable alternative hypotheses
lying within the restricted class $C_0^s(M)$.

Proof of Theorem 3.4. Suppose such an estimator $\hat f_n$ exists. For i > 0, set
ji+i := pji + 1, and consider functions 

oo 

fo ■= Pjo^jofi + X] l^n'^hfl, fk ■= fo + (3j'ipj,k, 

i=l 

where /3j := M2"^(*+V2)^ j > to be determined, and k £ [N,2^ - N). 
By definition, these functions are in $C_0^s(M)$. By standard arguments, $\hat f_n$
must be able to distinguish the hypothesis $H_0 : f = f_0$ from alternatives
$H_k : f = f_k$, contradicting Lemma D.1. □

Finally, we will show that the self-similarity condition (2.1) is as weak 
as possible. 

Proof of Theorem 3.5. We argue in a similar fashion to Theorem 3.4, taking 
care to account for the dishonesty of $C_n$. Suppose such a band $C_n$ exists. For
$m = 1, 2, \dots, \infty$, we will construct functions $f_m$ which serve as hypotheses
for the function /. We will choose these functions so that fm G C^'"(M), for 
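a sequence $s_m \in (s_{\min}, s_{\max})$ with limit $s_\infty \in (s_{\min}, s_{\max})$. We will then find
a subsequence $n_m$ such that, for $\delta := \frac14(1 - 2\gamma)$,

$$\inf_m \mathbb{P}_{f_\infty}(f_\infty \notin C_{n_m}) \ge \gamma + \delta,$$

contradicting our assumptions on $C_n$.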

Taking infima if necessary, we may assume $\rho_j$ increasing; for i > 0,
set $j_{i+1} := \rho_{j_i} j_i + 1$. Then for $m = 1, 2, \dots, \infty$, set

00 m 
fm ■= bo,mVO + X bi^rnlpi + ^ b'llp'i, 
i=l 1=1 
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where 

and bi^rn, h\ G M, z/ G N, and A;; e [N, 2^* - A/") \ {2J<-i} are to be determined. 
We will set — 1 = io < ^1 < ■ ■ ■ ) 

_ fM2--?<(^i+V2)^ < z < i^+i for some Z < m, 
''"^ \M2-^-^(^'"+V2), i>i^, 

and 

h'l ■- M2~-'*i(^'+^/^). 

Set 

So := Smax, — Sm-1 " (j^^^ " Jj^Vl) log2(e"^), m > 0, 

^0 := Smin, tm ■= Sm - jj^+i logaCs"^), m > 0, 

and choose n large enough that: 

(i) ti > to; 

(ii) for i> ii, the V'i are interior wavelets, supported inside (0, 1); and 

(iii) the set of choices for ki is non-empty. 

By definition, $s_m$ is decreasing, $t_m$ increasing, and $s_m - t_m \searrow 0$. For $m \ge 1$,
both sequences thus lie in $(s_{\min}, s_{\max})$, and tend to a limit $s_\infty \in (s_{\min}, s_{\max})$.
For all m = 1, 2, . . . , GO, Z G N, and ii < i < ii+i, 

_/y'2^-^^(*''+-'-/^) > £M2~-''*-*'+i"'~^/^^ > £M2~-''^*'""'""'"/^\ 

so indeed fm G Ci'"(M). 

We have thus defined $f_1$, making an arbitrary choice of $k_1$; for convenience,
set $n_1 = 1$. Inductively, suppose we have defined $f_{m-1}$ and $n_{m-1}$,
and set $r_n := r_n(s_{m-1})$. For $n_m > n_{m-1}$ and $D > 0$ both large, we have:

(i) $\mathbb{P}_{f_{m-1}}(f_{m-1} \notin C_{n_m}) \le \gamma + \delta$; and

(ii) $\mathbb{P}_{f_{m-1}}(|C_{n_m}| \ge D r_{n_m}) \le \delta$.

Setting r„ = 1 (3 / G C„ : 11/ - /m-ilL > ' we then have 

< 7 + 2(5. (D.l) 

We claim it is possible to choose fm and nm so that also, for any further 
choice of functions fi, 

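$$\|f_\infty - f_{m-1}\|_\infty \ge 2 D r_{n_m}, \tag{D.2}$$

and

$$\mathbb{P}_{f_\infty}(T_{n_m} = 0) \ge 1 - \gamma - 3\delta = \gamma + \delta. \tag{D.3}$$

We may then conclude that

$$\mathbb{P}_{f_\infty}(f_\infty \notin C_{n_m}) \ge \mathbb{P}_{f_\infty}(T_{n_m} = 0) \ge \gamma + \delta,$$

as required.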
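The identity $1 - \gamma - 3\delta = \gamma + \delta$ in (D.3) is what fixes $\delta = \frac14(1 - 2\gamma)$. A short exact-arithmetic check in Python follows; the function name is hypothetical, and the restriction $0 < \gamma < \frac12$ is our assumption, made so that $\delta > 0$.

```python
from fractions import Fraction


def delta_from_gamma(gamma: Fraction) -> Fraction:
    """Return delta = (1 - 2 gamma) / 4, the value solving 1 - gamma - 3 delta = gamma + delta."""
    if not 0 < gamma < Fraction(1, 2):
        raise ValueError("need 0 < gamma < 1/2 so that delta > 0")
    return (1 - 2 * gamma) / 4


if __name__ == "__main__":
    gamma = Fraction(1, 10)
    delta = delta_from_gamma(gamma)
    print(delta, 1 - gamma - 3 * delta, gamma + delta)
```

Using `Fraction` avoids floating-point rounding, so the two sides of the identity compare exactly.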

It remains to verify the claim. Letting $i_m \to \infty$, choose $n_m$ so that



(D.4) 



for $D' > 0$ to be determined. Now,

oo / 



ti+i 



l=m 



i=il+l 

H + l 



< 



"Ji ^min 



i=m \ i=ii+l 
oo 

< 9 \ ■ 9-J'*min 



2 5: 2- 

i=iim + l 
2l~J»m + l*n 

1 - 2-«n.i 



SO, for im large. 



fm-l - /oolloo > IIO- 



m"fm II oo 



> M 



> MU\\^ I 2-J"»^' 



> iM||^|L2-^-' 



l=m+l i=im.+l 

2l-J»m + l«m 



1 - 2- 



We have thus satisfied (D.2), for a suitable choice of $D'$.

To satisfy (D.3), we will apply Lemma D.1, testing $H_0 : f = f_{m-1}$ against
$H_1 : f = f_\infty$. The observations $X_i$ will correspond to j V'm(*) '^^t-, for all
possible choices of $k_m$, and the $Y_i$ to the other empirical wavelet coefficients.
From (D.4),
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so the quantity 



and likewise 



oo / + l 

J=m \ i=i(+l 
= 0(1). 

Thus, for im large, 

(2i>™ - (2Ar + l))-V2(eM^ _ i)i/2 + (^^e _ i)i/2 < ^_ 

Hence by Lemma D.1, if we take $i_m$ large enough also that (D.1) holds, then
(D.3) holds for a suitable choice of $k_m$, and our claim is proved. □
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